[Core][BUG-FIX] Fix distinct collect agg bug of un-merged initial collect. #7025
LiangDai-Mars wants to merge 1 commit into apache:master
@JingsongLi We found that with distinct collect aggregation, if a key has only a single array record and that array contains duplicate elements, the initial array is returned without the duplicates being removed. Please review this fix, since it changes the default logic of the merge function.
JingsongLi
left a comment
ReducerMergeFunctionWrapper was introduced to optimize the wrapped {@link MergeFunction}: if there is only one input, the input is stored and the inner merge function is not called, which saves some computation.
With your fix, that optimization no longer has any effect.
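The shortcut described in this comment, and how it produces the reported bug, can be sketched roughly as follows. This is a simplified illustration, not Paimon's actual code: `Reducer`, `distinctUnion`, and `reduce` are hypothetical names, and plain `List<Integer>` values stand in for the real key-value records.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.function.BinaryOperator;

public class ReducerShortcutSketch {

    /** Hypothetical stand-in for a reducer-style wrapper (not the real Paimon class). */
    static final class Reducer<T> {
        private final BinaryOperator<T> merge;
        private T current;

        Reducer(BinaryOperator<T> merge) {
            this.merge = merge;
        }

        void add(T value) {
            if (current == null) {
                // First input: stored as-is, the merge function is NOT called.
                current = value;
            } else {
                // Second and later inputs: the merge function runs.
                current = merge.apply(current, value);
            }
        }

        T getResult() {
            return current;
        }
    }

    /** Assumed distinct-collect merge: union of both arrays without duplicates. */
    static List<Integer> distinctUnion(List<Integer> a, List<Integer> b) {
        LinkedHashSet<Integer> set = new LinkedHashSet<>(a);
        set.addAll(b);
        return new ArrayList<>(set);
    }

    static List<Integer> reduce(List<List<Integer>> inputs) {
        Reducer<List<Integer>> reducer = new Reducer<>(ReducerShortcutSketch::distinctUnion);
        for (List<Integer> input : inputs) {
            reducer.add(input);
        }
        return reducer.getResult();
    }

    public static void main(String[] args) {
        // Single record: the shortcut returns the input untouched, duplicates included.
        System.out.println(reduce(List.of(List.of(1, 1, 2))));                // [1, 1, 2]
        // Two records: the merge function runs and removes duplicates.
        System.out.println(reduce(List.of(List.of(1, 1, 2), List.of(2, 3)))); // [1, 2, 3]
    }
}
```

In this sketch the deduplication lives entirely inside the merge function, so skipping the call for a single input skips the deduplication as well.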
So perhaps the wrapper itself needs to know whether the MergeFunction it wraps must always be merged, even for a single input. That could be communicated through a method on the interface.
So you mean we should add a new method such as "forceMergeIntialValue" to org.apache.paimon.mergetree.compact.MergeFunctionWrapper?
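One possible shape for this idea is sketched below, under stated assumptions. `SimpleMergeFunction`, `forceMergeSingleInput`, `ReducerWrapper`, and `DistinctCollect` are all hypothetical names chosen for illustration; the real interfaces operate on key-value records and may end up looking quite different. Here the flag sits on the merge function and the wrapper consults it, so functions that do not need it keep the single-input shortcut.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;

public class ForceMergeSketch {

    /** Hypothetical, simplified merge function interface. */
    interface SimpleMergeFunction<T> {
        void reset();
        void add(T value);
        T getResult();

        /** Assumed opt-in flag: true if even a single input must pass through add(). */
        default boolean forceMergeSingleInput() {
            return false;
        }
    }

    /** Hypothetical wrapper that keeps the shortcut unless the function opts out. */
    static final class ReducerWrapper<T> {
        private final SimpleMergeFunction<T> fn;
        private T initial;
        private boolean merged;

        ReducerWrapper(SimpleMergeFunction<T> fn) {
            this.fn = fn;
            fn.reset();
        }

        void add(T value) {
            if (!merged && initial == null && !fn.forceMergeSingleInput()) {
                initial = value; // shortcut: store the first input, skip the merge
                return;
            }
            if (!merged && initial != null) {
                fn.add(initial); // a second input arrived: replay the stored one
                initial = null;
            }
            merged = true;
            fn.add(value);
        }

        T getResult() {
            return merged ? fn.getResult() : initial;
        }
    }

    /** Distinct collect over integer arrays; the flag is configurable for comparison. */
    static final class DistinctCollect implements SimpleMergeFunction<List<Integer>> {
        private final LinkedHashSet<Integer> seen = new LinkedHashSet<>();
        private final boolean force;

        DistinctCollect(boolean force) {
            this.force = force;
        }

        @Override public void reset() { seen.clear(); }
        @Override public void add(List<Integer> value) { seen.addAll(value); }
        @Override public List<Integer> getResult() { return new ArrayList<>(seen); }
        @Override public boolean forceMergeSingleInput() { return force; }
    }

    static List<Integer> run(boolean force, List<List<Integer>> inputs) {
        ReducerWrapper<List<Integer>> wrapper = new ReducerWrapper<>(new DistinctCollect(force));
        for (List<Integer> input : inputs) {
            wrapper.add(input);
        }
        return wrapper.getResult();
    }

    public static void main(String[] args) {
        System.out.println(run(false, List.of(List.of(1, 1, 2)))); // [1, 1, 2]
        System.out.println(run(true, List.of(List.of(1, 1, 2))));  // [1, 2]
    }
}
```

With the flag left at its default of `false`, the wrapper behaves as before, so only aggregations such as distinct collect would pay the cost of merging a single input.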
Purpose
Tests
To verify the fix, this PR enhances the integration test case CollectAggregationITCase.java in the paimon-flink-common module.
This test case effectively covers the problematic scenario and ensures the correctness of the fix.
API and Format
No API or format changes. This change only involves an internal logic correction and does not affect any external APIs, data storage formats, or configuration files.
Documentation
This change is an internal bug fix and does not require updates to user-facing documentation.